Abstract

Given a connected graph $G$ on $n$ vertices and a positive integer $k \le n$, a subgraph of $G$ on $k$ vertices is called a $k$-subgraph in $G$. We design combinatorial approximation algorithms for finding a connected $k$-subgraph in $G$ such that its density is at least a factor $\Omega(\max\{n^{-2/5}, k^2/n^2\})$ of the density of the densest $k$-subgraph in $G$ (which is not necessarily connected). These particularly provide the first non-trivial approximations for the densest connected $k$-subgraph problem on general graphs.

1 Introduction

Let $G = (V, E)$ be a connected simple undirected graph with $n$ vertices, $m$ edges, and nonnegative edge weights. The (weighted) density of $G$ is defined as its average (weighted) degree. Let $k \le n$ be a positive integer. A subgraph of $G$ is called a $k$-subgraph if it has exactly $k$ vertices. The densest $k$-subgraph problem (D$k$SP) is to find a $k$-subgraph of $G$ that has the maximum density, equivalently, a maximum number of edges. If the $k$-subgraph is required to be connected, then the problem is referred to as the densest connected $k$-subgraph problem (DC$k$SP). Both D$k$SP and DC$k$SP have their weighted generalizations, denoted respectively as H$k$SP and HC$k$SP, which ask for a heaviest (connected) $k$-subgraph, i.e., a (connected) $k$-subgraph with a maximum total edge weight. Identifying $k$-subgraphs with high densities is a useful primitive, which arises in diverse applications, from social networks to protein interaction graphs to the world wide web. While dense subgraphs can give valuable information about interactions in these networks, the additional connectivity requirement turns out to be natural in various scenarios. One typical example is searching for a large community. If most vertices belong to a dense connected subnetwork, only a few selected inter-hub links are needed to have a short average distance between any two arbitrary vertices in the entire network. Commercial airlines employ this hub-based routing scheme [22].

Related work. An easy reduction from the maximum clique problem shows that D$k$SP, DC$k$SP and their weighted generalizations are all NP-hard in general. The NP-hardness remains even for some very restricted graph classes such as chordal graphs, triangle-free graphs, comparability graphs [9] and bipartite graphs of maximum degree three [14].

Most literature on finding dense subgraphs focuses on the versions without requiring subgraphs to be connected. For D$k$SP and its generalization H$k$SP, narrowing the large gap between the lower and upper bounds on the approximability is an important open problem. On the negative side, the decision problem version of D$k$SP, in which one is asked if there is a $k$-subgraph with more than $h$ edges, is NP-complete even if $h$ is restricted by $h \le k^{1+\varepsilon}$ [3]. Feige [11] showed that computing a $(1+\varepsilon)$-approximation for D$k$SP is at least as hard as refuting random 3-SAT clauses for some $\varepsilon > 0$. Khot [T5] showed that there does not exist any polynomial time approximation scheme (PTAS) for D$k$SP assuming NP does not have randomized algorithms that run in sub-exponential time. Recently, constant factor approximations in polynomial time for D$k$SP have been ruled out by Raghavendra and Steurer [26] under the Unique Games with Small Set Expansion conjecture, and by Alon et al. [1] under certain "average case" hardness assumptions. On the positive side, considerable efforts have been devoted to finding good quality approximations for H$k$SP. Improving the $O(n^{0.3885})$-approximation of Kortsarz and Peleg [20], Feige et al. [13] proposed a combinatorial algorithm with approximation ratio $O(n^{\delta})$ for some $\delta < 1/3$. The latest algorithm of Bhaskara et al. [6] provides an $O(n^{1/4+\varepsilon})$-approximation in $n^{O(1/\varepsilon)}$ time. If allowed to run for $n^{O(\log n)}$ time, their algorithm guarantees an approximation ratio of $O(n^{1/4})$. The $O(n/k)$-approximation algorithm by Asahiro et al. [5] is remarkable for its simple greedy removal method. Linear and semidefinite programming (SDP) relaxation approaches have been adopted to design randomized rounding algorithms, where Feige and Langberg [12] obtained an approximation ratio somewhat better than $n/k$, while the algorithms of Srivastav and Wolf [28] and Han et al. [TB] outperform this ratio for a range of values $k = \Theta(n)$. On the other hand, the SDP relaxation methods have a limit of $n^{\Omega(1)}$ for D$k$SP as shown by Feige and Seltser [14] and Bhaskara et al. [7].

For some special cases in terms of graph classes, values of $k$ and optimal objective values, better approximations have been obtained for D$k$SP and H$k$SP. Arora et al. [3] gave a PTAS for the restricted D$k$SP where $m = \Omega(n^2)$ and $k = \Omega(n)$, or each vertex of $G$ has degree $\Omega(n)$. Kortsarz and Peleg [20] approximated D$k$SP with ratio $O((n/k)^{2/3})$ when the number of edges in the optimal solution is larger than $2\sqrt{k^5/n}$. Demaine et al. [10] developed a 2-approximation algorithm for D$k$SP on $H$-minor-free graphs, where $H$ is any given fixed undirected graph. Chen et al. [8] showed that D$k$SP on a large family of intersection graphs, including chordal graphs, circular-arc graphs and claw-free graphs, admits constant factor approximations. Several PTASs have been designed for D$k$SP on unit disk graphs [8], interval graphs [25], and a subclass of chordal graphs [24].

The work on approximating densest/heaviest connected $k$-subgraphs is relatively limited. To the best of our knowledge, the existing polynomial time algorithms deal only with special graph topologies, including: (a) a 4-approximation [27] and a 2-approximation for the metric H$k$SP and HC$k$SP, where the underlying graph $G$ is complete, and the connectivity is trivial; (b) exact algorithms for H$k$SP and HC$k$SP on trees [9], for D$k$SP and DC$k$SP on $k$-trees, cographs and split graphs [9], and for DC$k$SP on interval graphs whose clique graphs are simple paths [23].

Among the well-known relaxations of D$k$SP and H$k$SP is the problem of finding a (connected) subgraph (without any cardinality constraint) of maximum weighted density. It is solvable in strongly polynomial time using max-flow based techniques. Andersen and Chellapilla [2] and Khuller and Saha studied two relaxed variants of H$k$SP for finding a weighted densest subgraph with at least or at most $k$ vertices. The former variant was shown to be NP-hard even in the unweighted case, and to admit 2-approximations in the weighted setting. The approximation of the latter variant was proved to be as hard as that of D$k$SP/H$k$SP up to a constant factor.

Our results. Given the interest in finding densest/heaviest connected $k$-subgraphs from both the theoretical and practical point of view, a better understanding of the problems is an important challenge for the field. In this paper, we design $O(mn \log n)$ time combinatorial approximation algorithms for finding a connected $k$-subgraph of $G$ whose density (resp. weighted density) is at least a factor $\Omega(\max\{n^{-2/5}, k^2/n^2\})$ (resp. $\Omega(\max\{n^{-2/3}, k^2/n^2\})$) of the density (resp. weighted density) of the densest (resp. heaviest) $k$-subgraph of $G$, which is not necessarily connected. These particularly provide the first non-trivial approximations for the densest/heaviest connected $k$-subgraph problem on general graphs: $O(\min\{n^{2/5}, n^2/k^2\})$ for DC$k$SP and $O(\min\{n^{2/3}, n^2/k^2\})$ for HC$k$SP.

To evaluate the quality of our algorithms' performance guarantees $O(n^{2/5})$ and $O(n^{2/3})$, which are compared with the optimums of D$k$SP and H$k$SP, we investigate the maximum ratio $\Lambda$ (resp. $\Lambda_w$), over all graphs $G$ (resp. over all graphs $G$ and all nonnegative edge weights), between the maximum density (resp. weighted density) of all $k$-subgraphs and that of all connected $k$-subgraphs in $G$. The following examples show $\Lambda \ge n^{1/3}/3$ and $\Lambda_w \ge n^{1/2}/2$.

Example 1.1. (a) The graph $G$ is formed from $\ell$ vertex-disjoint $\ell$-cliques $L_1, \ldots, L_\ell$ by adding, for each $i = 1, \ldots, \ell - 1$, a path $P_i$ of length $\ell^2 + 1$ to connect $L_i$ and $L_{i+1}$, where $P_i$ intersects all the $\ell$ cliques only at a vertex in $L_i$ and a vertex in $L_{i+1}$. Let $k = \ell^2$. Note that $G$ has $n = \ell^2 + \ell^2(\ell - 1) = \ell^3$ vertices. The unique densest $k$-subgraph of $G$ is the disjoint union of $L_1, \ldots, L_\ell$ and has density $\ell - 1$. One of the densest connected $k$-subgraphs of $G$ is induced by the $\ell$ vertices in $L_1$ and certain $\ell^2 - \ell$ vertices in $P_1$, and has density $(\ell(\ell - 1) + 2(\ell^2 - \ell))/\ell^2$. Hence $\Lambda \ge \ell^2/(\ell + 2\ell) = n^{1/3}/3$.

(b) The graph $G$ is a tree formed from a star on $\ell + 1$ vertices by dividing each edge into a path of length $\ell + 1$. All pendant edges have weight 1 and other edges have weight 0. Let $k = 2\ell$. Note that $G$ has $n = \ell^2 + 1$ vertices. The unique heaviest $k$-subgraph of $G$ is induced by the $\ell$ pendant edges of $G$, and has weighted density 1. Every heaviest connected $k$-subgraph of $G$ is a path containing exactly one pendant edge of $G$, and has weighted density $1/\ell$. Hence $\Lambda_w \ge \ell \ge n^{1/2}/2$.
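
The arithmetic in Example 1.1(a) can be checked numerically. The following is a minimal sketch, not part of the paper: the vertex labels `("c", i, j)` and `("p", i, j)` and the helper names are our own, and the connected subgraph used is the particular one described in the example ($L_1$ plus the first $\ell^2 - \ell$ internal vertices of $P_1$).

```python
from itertools import combinations

def example_graph(l):
    """Edge set of the graph in Example 1.1(a) for clique size l.

    ("c", i, j) is the j-th vertex of clique L_i, and ("p", i, j) is the
    j-th internal vertex of path P_i (labels chosen for this sketch).
    """
    edges = set()
    for i in range(l):
        clique = [("c", i, j) for j in range(l)]
        edges.update(frozenset(e) for e in combinations(clique, 2))
    for i in range(l - 1):
        # A path of length l^2 + 1 has l^2 internal vertices.
        chain = [("c", i, 0)] + [("p", i, j) for j in range(l * l)] + [("c", i + 1, 0)]
        edges.update(frozenset(e) for e in zip(chain, chain[1:]))
    return edges

def density(edges, vertices):
    """Average degree of the subgraph induced by `vertices`."""
    vs = set(vertices)
    inside = sum(1 for e in edges if e <= vs)
    return 2 * inside / len(vs)

l = 4
edges = example_graph(l)
n = len({v for e in edges for v in e})
all_cliques = [("c", i, j) for i in range(l) for j in range(l)]
connected = [("c", 0, j) for j in range(l)] + [("p", 0, j) for j in range(l * l - l)]
ratio = density(edges, all_cliques) / density(edges, connected)
```

For $\ell = 4$ the two vertex sets both have $k = 16$ vertices, and `ratio` equals $\ell/3 = n^{1/3}/3$, in line with the bound stated in the example. The sketch does not verify that the chosen connected subgraph is optimal; that part relies on the argument in the text.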

The remainder of this paper is organized as follows. Section 2 gives notations, definitions and basic 
properties necessary for our discussion. Section 3 is devoted to designing approximation algorithms for 
finding connected dense $k$-subgraphs. Section 4 discusses extension to the weighted case, and future research
directions. 

2 Preliminaries 

Graphs studied in this paper are simple and undirected. For any graph $G' = (V', E')$ and any vertex $v \in V'$, we use $d_{G'}(v)$ to denote its degree in $G'$. The density $\sigma(G')$ of $G'$ refers to its average degree, i.e., $\sigma(G') = \sum_{v \in V'} d_{G'}(v)/|V'| = 2|E'|/|V'|$. Following convention, we define $|G'| = |V'|$. By a component of $G'$ we mean a maximal connected subgraph of $G'$.

Throughout let $G = (V, E)$ be a connected graph on $n$ vertices and $m$ edges, and let $k \in [3, n]$ be an integer. Our goal is to find a connected $k$-subgraph $C$ of $G$ such that its density $\sigma(C)$ is as large as possible. Let $\sigma^*(G)$ and $\sigma^*_k(G)$ denote the maximum densities of a subgraph and a $k$-subgraph of $G$, respectively, where the subgraphs are not necessarily connected. It is clear that

\[ \sigma^*(G) \ge \sigma^*_k(G) \quad \text{and} \quad n - 1 \ge \sigma(G) \ge k \cdot \sigma^*_k(G)/n. \tag{2.1} \]

Let $S$ be a subset of $V$ or a subgraph of $G$. We use $G[S]$ to denote the subgraph of $G$ induced by the vertices in $S$, and use $G \setminus S$ to denote the graph obtained from $G$ by removing all vertices in $S$ and their incident edges. If $S$ consists of a single vertex $v$, we write $G \setminus v$ instead of $G \setminus \{v\}$.

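Lemma 2.1. $\sigma^*_k(G) \le \sigma^*_{k-1}(G) + 2$ and $\sigma^*_k(G) \le 3 \cdot \sigma^*_{k-1}(G)$.

Proof. The first inequality in the lemma implies the second since $\sigma^*_{k-1}(G) \ge 1$. To prove $\sigma^*_k(G) \le \sigma^*_{k-1}(G) + 2$, consider a densest $k$-subgraph $H$ of $G$, and $v \in V(H)$. Then $d_H(v) \le k - 1$, and

\[ \sigma^*_{k-1}(G) \ge \sigma(H \setminus v) = \frac{2(|E(H)| - d_H(v))}{k - 1} \ge \sigma(H) - \frac{2 d_H(v)}{k - 1} \ge \sigma(H) - 2 = \sigma^*_k(G) - 2, \]

establishing the lemma. □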

The vertices whose removals increase the density of the graph play an important role in our algorithm 
design. 

Definition 2.2. A vertex $v \in V$ is called removable in $G$ if $\sigma(G \setminus v) > \sigma(G)$.

Since $\sigma(G \setminus v) = 2(|E| - d_G(v))/(|V| - 1)$, the following is straightforward. It also provides an efficient way to identify removable vertices.

Lemma 2.3. A vertex $v \in V$ is removable in $G$ if and only if $d_G(v) < \sigma(G)/2$.
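
As an illustration of Definition 2.2 and Lemma 2.3, the sketch below compares the two tests for removability on a small graph. It is only a sanity check under our own conventions: graphs are stored as a dictionary mapping each vertex to its neighbour set, and all function names are hypothetical. Exact rational arithmetic is used so that the strict inequalities are not affected by rounding.

```python
from fractions import Fraction

def density(adj):
    """Average degree 2|E|/|V| of a graph given as {vertex: set_of_neighbours}."""
    return Fraction(sum(len(nb) for nb in adj.values()), len(adj))

def delete_vertex(adj, v):
    """Copy of the graph with v and its incident edges removed."""
    return {u: nb - {v} for u, nb in adj.items() if u != v}

def removable_by_definition(adj, v):
    """Definition 2.2: deleting v strictly increases the density."""
    return density(delete_vertex(adj, v)) > density(adj)

def removable_by_degree(adj, v):
    """Lemma 2.3: the degree of v is below half of the density."""
    return len(adj[v]) < density(adj) / 2

# A 4-clique on {0, 1, 2, 3} with a pendant vertex 4 attached to vertex 3.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
removable = [v for v in adj if removable_by_degree(adj, v)]
```

Here the density is $14/5$, so only the pendant vertex has degree below $\sigma(G)/2 = 7/5$; deleting it leaves the 4-clique of density 3.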

Lemma 2.4. Let $G_1$ be a connected $k$-subgraph of $G$. For any connected subgraph $G_2$ of $G_1$, it holds that

\[ \sigma(G_1) \ge \sigma(G_2)/\sqrt{k}. \]

Proof. Suppose that $G_2$ is a $k_2$-subgraph of $G$ with $m_2$ edges. By the definition of density, $\sigma(G_2) \le k_2 - 1$. The connectivity of $G_1$ implies $|E(G_1)| \ge |E(G_2)| + |V(G_1 \setminus G_2)|$, and

\[ \sigma(G_1) \ge \frac{2(m_2 + k - k_2)}{k} = \frac{k_2 \cdot \sigma(G_2) + 2(k - k_2)}{k}. \]

In case of $k_2 \ge \sqrt{k}$, we have $\sigma(G_1) \ge k_2 \cdot \sigma(G_2)/k \ge \sigma(G_2)/\sqrt{k}$. In case of $k_2 < \sqrt{k}$, since $k \ge 3$, it follows that $G_1$ has no isolated vertices, and $\sigma(G_1) \ge 1 \ge k_2/\sqrt{k} > \sigma(G_2)/\sqrt{k}$. □

For a cut-vertex $v$ of $G$, we use $G_v$ to denote a densest component of $G \setminus v$, and use $G_{v+}$ to denote the connected subgraph of $G$ induced by $V(G_v) \cup \{v\}$. Note that $G \setminus G_v$ is a connected subgraph of $G$.

3 Algorithms 

We design an $O(n^2/k^2)$-approximation algorithm (in Section 3.1) and further an $O(n^{2/5})$-approximation algorithm (in Section 3.2) for D$k$SP that always finds a connected $k$-subgraph of $G$. For ease of description we assume $k$ is even. The case of odd $k$ can be treated similarly. Alternatively, if $k$ is odd, we can first find a connected $(k-1)$-subgraph $G_1$ satisfying $\sigma^*_{k-1}(G)/\sigma(G_1) \le O(\alpha)$, where $\alpha \in \{n^2/k^2, n^{2/5}\}$; it follows from Lemma 2.1 that $\sigma^*_k(G)/\sigma(G_1) \le O(\alpha)$. Then we attach an appropriate vertex to $G_1$, making a connected $k$-subgraph $G_2$ with density $\sigma(G_2) \ge \frac{k-1}{k}\sigma(G_1) \ge \frac{2}{3}\sigma(G_1)$. This guarantees that the approximation ratio is still $\sigma^*_k(G)/\sigma(G_2) \le O(\alpha)$.

3.1 $O(n^2/k^2)$-approximation

We first give an outline of our algorithm (see Algorithm 1) for finding a connected $k$-subgraph $C$ of $G$ with density $\sigma(C) \ge \Omega(k^2/n^2) \cdot \sigma^*_k(G)$ (see Theorem 3.3).

Outline. We start with a connected graph $G' \leftarrow G$ and repeatedly delete removable vertices from $G'$ to increase its density without destroying its connectivity.

• If we can reach $G'$ with $|G'| = k$ in this way, we output $C$ as the resulting $G'$.

• If we can find a removable cut-vertex $r$ in $G'$ such that $|G'_r| \ge k$, then we recurse with $G' \leftarrow G'_r$.

• If we stop at a $G'$ without any removable vertices, then we construct $C$ from an arbitrary connected $(k/2)$-subgraph by greedily attaching $k/2$ more vertices (see Procedure 1).

• If we are in none of the above three cases, we find a connected subgraph of $G'$ induced by a set $S$ of at most $k/2$ vertices, and then expand the subgraph in two ways: (1) attaching $G'_r$ for all removable vertices $r$ of $G'$ which are contained in $S$, and (2) greedily attaching no more than $k/2$ vertices. From the resulting connected subgraphs, we choose the one that has more edges (breaking ties arbitrarily), and further expand it to be a connected $k$-subgraph (see Procedure 2), which is returned as the output $C$.


Greedy attachment. We describe how the greedy attaching mentioned in the above outline proceeds. Let $S$ and $T$ be disjoint nonempty vertex subsets (or subgraphs) of $G$. Note that $1 \le |S| < n$. The set of edges of $G$ with one end in $S$ and the other in $T$ is written as $[S, T]$. For any positive integer $j \le n - |S|$, a set $S^*$ of $j$ vertices in $G \setminus S$ with maximum $|[S, S^*]|$ can be found greedily by sorting the vertices in $G \setminus S$ as $v_1, v_2, \ldots, v_j, \ldots$ in a non-increasing order of the number of neighbors they have in $S$. For each $i = 1, \ldots, j$, it can be guaranteed that $v_i$ has either a neighbor in $S$ or a neighbor in $\{v_1, \ldots, v_{i-1}\}$; in the latter case $i \ge 2$. Setting $S^* = \{v_1, v_2, \ldots, v_j\}$, it is easy to see that

\[ |[S, S^*]| \ge \frac{j}{n - |S|} \cdot |[S, G \setminus S]|. \tag{3.1} \]

Moreover, if $G[S]$ is connected, the choices of the $v_i$'s guarantee that $G[S \cup S^*]$ is connected. We refer to this $S^*$ as a $j$-attachment of $S$ in $G$. Given $S$, finding a $j$-attachment of $S$ takes $O(m + n \log n)$ time, which implies the following procedure runs in $O(|E(G')| + |G'| \cdot \log |G'|)$ time.
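
The greedy attachment described above can be sketched as follows. This is an illustrative implementation under our own assumptions rather than the authors' code: the graph is a dictionary `{vertex: set_of_neighbours}`, the function name `attachment` is hypothetical, and ties in the ordering are broken by dictionary order. When fewer than $j$ vertices outside $S$ have a neighbour in $S$, the sketch pads the selection with vertices adjacent to earlier picks, which is one way to meet the neighbour condition on $v_i$ stated in the text.

```python
from collections import deque

def attachment(adj, S, j):
    """Return a list of j vertices outside S forming a j-attachment of S.

    Assumes the graph is connected and 1 <= j <= n - |S|.
    """
    S = set(S)
    outside = [v for v in adj if v not in S]
    # Number of neighbours that each outside vertex has in S.
    score = {v: len(adj[v] & S) for v in outside}
    ranked = sorted(outside, key=lambda v: score[v], reverse=True)
    chosen = [v for v in ranked[:j] if score[v] > 0]
    # If fewer than j vertices touch S, pad with vertices adjacent to earlier
    # picks; they add no edges to [S, S*] but keep the result connected.
    picked = set(chosen)
    frontier = deque(chosen)
    while len(chosen) < j:
        u = frontier.popleft()
        for w in adj[u]:
            if w not in S and w not in picked and len(chosen) < j:
                picked.add(w)
                chosen.append(w)
                frontier.append(w)
    return chosen

# Small connected graph: triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
S_star = attachment(adj, {0}, 3)
```

With $S = \{0\}$ and $j = 3$, the two neighbours of vertex 0 are taken first and vertex 3 is added through vertex 2, so $|[S, S^*]| = 2$, which is consistent with the bound (3.1).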

Procedure 1. Input: a connected graph $G'$ without removable vertices, where $|G'| \ge k$.
Output: a connected $k$-subgraph of $G'$, written as $\mathrm{Prc1}(G')$.

1. $G_1 = (V_1, E_1) \leftarrow$ an arbitrary connected $(k/2)$-subgraph of $G'$

2. $V_1^* \leftarrow$ a $(k/2)$-attachment of $V_1$ in $G'$

3. Output $\mathrm{Prc1}(G') \leftarrow G'[V_1 \cup V_1^*]$

Note that the definition of attachment guarantees that $V_1 \cap V_1^* = \emptyset$, $|[V_1, V_1^*]|$ is maximum, and $G'[V_1 \cup V_1^*]$ is connected.

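Lemma 3.1. $\sigma(\mathrm{Prc1}(G')) \ge \frac{k}{4|G'|} \cdot \sigma(G')$.

Proof. Since $G'$ has no removable vertices, we deduce from Lemma 2.3 that every vertex of $G'$ has degree at least $\sigma(G')/2$. Therefore $|[G_1, G' \setminus G_1]| \ge \frac{k}{2} \cdot \frac{\sigma(G')}{2} - 2|E_1|$. Recalling (3.1), we see that the number of edges in $\mathrm{Prc1}(G')$ is at least

\[ |[V_1, V_1^*]| + |E_1| \ge \left(\frac{k\,\sigma(G')}{4} - 2|E_1|\right) \cdot \frac{k}{2|G'|} + |E_1| \ge \frac{k^2}{8|G'|} \cdot \sigma(G'), \]

proving the lemma. □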

Procedure 2. Input: a connected graph $G'$ with $|G'| > k$, where every removable vertex $r$ is a cut-vertex satisfying $|G'_r| < k$. Output: a connected $k$-subgraph of $G'$, written as $\mathrm{Prc2}(G')$.

1. $H \leftarrow G'$, $R' \leftarrow R \leftarrow$ the set of removable vertices of $G'$

2. While $R' \ne \emptyset$ do

3. Take $r \in R'$

4. $H \leftarrow H \setminus V(G'_r)$, $R' \leftarrow R' \setminus V(G'_{r+})$

5. End-While

6. For each $v \in V(H)$, define $\theta(v) = |G'_{v+}|$ if $v \in R$, and $\theta(v) = 1$ otherwise

7. Let $S$ be a minimal subset of $V(H)$ s.t. $H[S]$ is connected and $\sum_{v \in S} \theta(v) \ge k/2$

8. Let $S^*$ be a $\min\{k/2, |H \setminus S|\}$-attachment of $S$ in $H$

9. $V_1 \leftarrow S \cup (\bigcup_{r \in S \cap R} V(G'_r))$, $V_2 \leftarrow S \cup S^*$

10. Let $H'$ be one of $G'[V_1]$ and $G'[V_2]$, whichever has more edges (break ties arbitrarily)

11. Expand $H'$ to be a connected $k$-subgraph of $G'$

12. Output $\mathrm{Prc2}(G') \leftarrow H'$

Under the condition that the resulting graph is connected, the expansion in Step 11 can be done in an arbitrary way. It is easy to see that Procedure 2 runs in $O(|G'| \cdot |E(G')|)$ time.

Lemma 3.2. At the end of the while-loop (Step 5) in Procedure 2, we have

(i) $H$ is a connected subgraph of $G'$.

(ii) If $H$ contains two distinct vertices $r$ and $s$ that are removable in $G'$, then (by the condition of the procedure both $r$ and $s$ are cut-vertices of $G'$, and moreover) $G'_r$ and $G'_s$ are vertex-disjoint.

Proof. Note that in every execution of the while-loop, $r \in R'$ is a cut-vertex of $H$, and $V(H) \cap V(G'_r)$ induces a component of $H \setminus r$. Thus $H$ is connected throughout the procedure. For any two removable vertices $r, s$ of $G'$ with $|G'_r| \le |G'_s|$ and $r, s \in V(H)$, if $G'_r$ and $G'_s$ are not vertex-disjoint, then $V(G'_r) \cup \{r\} \subseteq V(G'_s)$. It follows that all vertices of $V(G'_r) \cup \{r\}$ have been removed by Step 4 when considering $s \in R'$, a contradiction. □

Observe that for any two distinct $r, s \in R$, either $G'_{r+}$ and $G'_{s+}$ are vertex-disjoint, or $G'_{r+}$ contains $G'_{s+}$, or $G'_{s+}$ contains $G'_{r+}$. This fact, along with an inductive argument, shows that, throughout Procedure 2, for any $s \in R \setminus V(H)$, there exists at least one vertex $r \in V(H) \cap R$ such that $G'_{r+}$ contains $G'_{s+}$, implying that $\big(\bigcup_{r \in R \cap V(H)} V(G'_{r+})\big) \cup (V(H) \setminus R) = V(G')$ holds always. By Lemma 3.2(ii), at Step 6 we see that $V(G')$ is the disjoint union of $V(G'_{r+})$, $r \in R \cap V(H)$, and $V(H) \setminus R$, giving $\sum_{v \in V(H)} \theta(v) = |G'| > k$. Hence, the connectivity of $H$ (Lemma 3.2(i)) implies that the set $S$ at Step 7 does exist.

Take $u \in S$ such that $u$ is not a cut-vertex of $H[S]$. If $|S| \ge (k/2) + 1$, then we have $\sum_{v \in S \setminus \{u\}} \theta(v) \ge |S \setminus \{u\}| \ge k/2$, a contradiction to the minimality of $S$. Hence

\[ |S| \le k/2. \]

Since Step 4 has removed from $H$ all vertices in $V(G'_r)$ for all $r \in R$, we see that $V_1$ is the disjoint union of $S$ and $\bigcup_{r \in R \cap S} V(G'_r)$. Recall that $|G'_r| < k$ for all $r \in R \cap S$. If $|V_1| > k$, then $|S| \ge 2$, and either $\theta(u) \ge k/2$ or $\sum_{v \in S \setminus \{u\}} \theta(v) \ge k/2$, contradicting the minimality of $S$. Noting that $|V_1| = \sum_{v \in S} \theta(v)$, we have

\[ k/2 \le |V_1| \le k. \tag{3.2} \]

We deduce that the output of Procedure 2 is indeed a connected $k$-subgraph of $G'$.

Algorithm 1. Input: connected graph $G = (V, E)$ with $|V| \ge k$.

Output: a connected $k$-subgraph of $G$, written as $\mathrm{Alg1}(G)$.

1. $G' \leftarrow G$

2. While $|G'| > k$ and $G'$ has a removable vertex $r$ that is not a cut-vertex do

3. $G' \leftarrow G' \setminus r$

4. End-While // either $|G'| = k$ or any removable vertex of $G'$ is a cut-vertex

5. If $|G'| = k$ then output $\mathrm{Alg1}(G) \leftarrow G'$

6. If $|G'| > k$ and $G'$ has no removable vertices

then output $\mathrm{Alg1}(G) \leftarrow \mathrm{Prc1}(G')$

7. If $|G'| > k$ and $|G'_r| < k$ for each removable vertex $r$ of $G'$

then output $\mathrm{Alg1}(G) \leftarrow \mathrm{Prc2}(G')$

8. If $|G'| > k$ and $|G'_r| \ge k$ for some removable vertex $r$ of $G'$

then output $\mathrm{Alg1}(G) \leftarrow \mathrm{Alg1}(G'_r)$
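
The deletion phase (Steps 1 to 4 of Algorithm 1) can be sketched as below. This is a simplified illustration, not the implementation analysed in Theorem 3.3: the cut-vertex test simply re-checks connectivity after a trial deletion, the names `peel`, `is_connected` and `delete_vertex` are our own, and Steps 5 to 8 are left out. Removability is tested through Lemma 2.3 in integer arithmetic, using $d(v) < \sigma/2 \iff 2\,|G'|\,d(v) < \sum_u d(u)$.

```python
def is_connected(adj):
    """Depth-first search connectivity test on {vertex: set_of_neighbours}."""
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def delete_vertex(adj, v):
    """Copy of the graph with v and its incident edges removed."""
    return {u: nb - {v} for u, nb in adj.items() if u != v}

def peel(adj, k):
    """Delete removable non-cut vertices while more than k vertices remain."""
    g = {u: set(nb) for u, nb in adj.items()}
    while len(g) > k:
        total_degree = sum(len(nb) for nb in g.values())
        candidate = None
        for v in g:
            if 2 * len(g) * len(g[v]) < total_degree:   # removable (Lemma 2.3)
                smaller = delete_vertex(g, v)
                if is_connected(smaller):               # v is not a cut-vertex
                    candidate = smaller
                    break
        if candidate is None:
            break   # every removable vertex, if any, is a cut-vertex
        g = candidate
    return g

# A 4-clique on {0, 1, 2, 3} with a tail 3-4-5.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
core = peel(adj, 4)
```

On this input the tail is peeled off leaf by leaf and the loop stops at the 4-clique. With `k = 3` the loop also stops at the 4-clique, because it has no removable vertex; in Algorithm 1 this situation is then handled by Step 6.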

In the while-loop, we repeatedly delete removable non-cut vertices from $G'$ until $|G'| = k$ or $G'$ has no removable non-cut vertex anymore. The deletion process keeps $G'$ connected, and its density $\sigma(G')$ increasing (cf. Definition 2.2). When the deletion process finishes, there are four possible cases, which are handled by Steps 5, 6, 7 and 8 respectively.

• In case of Step 5, the output $G'$ is clearly a connected $k$-subgraph of $G$.

• In case of Step 6, $G'$ qualifies to be an input of Procedure 1. With this input, Procedure 1 returns the connected $k$-subgraph $\mathrm{Prc1}(G')$ of $G'$ as the algorithm's output.

• In case of Step 7, $G'$ qualifies to be an input of Procedure 2. With this input, Procedure 2 returns the connected $k$-subgraph $\mathrm{Prc2}(G')$ of $G'$ as the algorithm's output.

• In case of Step 8, the algorithm recurses with smaller input $G'_r$, which satisfies $\sigma(G'_r) > \sigma(G') \ge \sigma(G)$ and $k \le |G'_r| < |G'| \le |G|$.

Hence after $O(n)$ recursions, the algorithm terminates at one of Steps 5, 6 and 7, and outputs a connected $k$-subgraph of $G$.

Theorem 3.3. Algorithm 1 finds in $O(mn)$ time a connected $k$-subgraph $C$ of $G$ such that $\sigma^*_k(G)/\sigma(C) \le 12n^2/k^2$.

Proof. Let $C = \mathrm{Alg1}(G)$ be the output connected $k$-subgraph of $G$. If $C$ is output at Step 5, then its density is $\sigma(C) \ge \sigma(G) \ge (k/n) \cdot \sigma^*_k(G)$, where the last inequality is by (2.1). If $C$ is output by Procedure 1 at Step 6, then from Lemma 3.1 we know its density is at least $\frac{k}{4|G'|} \cdot \sigma(G') \ge \frac{k}{4n} \cdot \sigma(G) \ge \frac{k^2}{4n^2} \cdot \sigma^*_k(G)$.

Now we are only left with the case that $C = \mathrm{Prc2}(G')$ is output by Procedure 2 at Step 7 of Algorithm 1. Let $R$ denote the set of removable vertices of $G'$. For every $r \in R$, we see that $r$ is a cut-vertex of $G'$ (cf. the note at Step 4 of the algorithm), and $\sigma(G'_r) \ge \sigma(G' \setminus r) > \sigma(G')$, where the first inequality is from the definition of $G'_r$ (it is the densest component of $G' \setminus r$), and the second inequality is due to the removability of $r$. Thus

\[ \sigma(G'_{r+}) \ge \sigma(G'_r) \cdot |G'_r|/(|G'_r| + 1) \ge \sigma(G')/2 \quad \text{for every } r \in R. \]

Using the notations in Procedure 2, we note that each vertex of $S \setminus R$ is non-removable in $G'$, and therefore has degree at least $\sigma(G')/2$ in $G'$ by Lemma 2.3. Since $V_1 = S \cup (\bigcup_{r \in R \cap S} V(G'_r)) = (S \setminus R) \cup (\bigcup_{r \in S \cap R} V(G'_{r+}))$ contains at least $k/2$ vertices (recall (3.2)), it follows that $G'$ contains at least $\left(\frac{k}{2} \cdot \frac{\sigma(G')}{2}\right)/2 \ge \frac{k}{8} \cdot \sigma(G) \ge \frac{k^2}{8n} \cdot \sigma^*_k(G)$ edges each with at least one end in $V_1$.

If there are at least $\frac{k^2}{24n} \cdot \sigma^*_k(G)$ edges with both ends in $V_1$, then by Step 10 of Procedure 2 we have $|E(C)| \ge \frac{k^2}{24n} \cdot \sigma^*_k(G)$ and $\sigma(C) = 2|E(C)|/k \ge \frac{k}{12n} \cdot \sigma^*_k(G) \ge \frac{k^2}{12n^2} \cdot \sigma^*_k(G)$. It remains to consider the case where $G'$ contains at least $\frac{k^2}{12n} \cdot \sigma^*_k(G)$ edges between $V_1$ and $G' \setminus V_1$. All these edges are between $S$ and $G' \setminus V_1 = H \setminus S$, since each edge incident with any vertex in $G'_r$ ($r \in R$) must have both ends in $V_1$. So, by the definition of $S^*$ at Step 8 of Procedure 2, we deduce from (3.1) that there are at least $|[S, S^*]| \ge \frac{k}{2n} \cdot |[S, H \setminus S]| \ge \frac{k^3}{24n^2} \cdot \sigma^*_k(G)$ edges in the subgraph of $G'$ induced by $V_2 = S \cup S^*$. Hence $\sigma(C) \ge 2|[S, S^*]|/k \ge \frac{k^2}{12n^2} \cdot \sigma^*_k(G)$, justifying the performance of the algorithm.

Algorithm 1 runs Procedure 1 or Procedure 2 at most once, which takes $O(mn)$ time. At least one of Procedures 1 and 2 is never called by the algorithm. Using appropriate data structures and $O(n^2)$ time preprocessing, we construct a list $L$ of removable vertices in $G'$ (cf. Lemma 2.3). It takes $O(m)$ time for Step 2 to determine whether a removable vertex $r \in L$ is a cut-vertex of $G'$, and to obtain $G'_r$ if it is. If $r$ is not a cut-vertex, then we remove $r$ from $G'$, and update $G'$ and $L$ in $O(n)$ time. If $r$ is a cut-vertex with $|G'_r| < k$, then $r$ remains a cut-vertex of $G'$ in the subsequent process (note $|G'| \ge k$ holds always) unless it is removed from the graph by a certain recursion at Step 8, so the subsequent while-loops will never consider it. If $r$ is a cut-vertex with $|G'_r| \ge k$, then we recurse on $G'_r$, and update $G' \leftarrow G'_r$ and $L$ in $O(|G'_r|) = O(n)$ time, throwing away $G' \setminus G'_r$ which contains $r$. Overall, the algorithm runs in $O(mn)$ time. □

3.2 $O(n^{2/5})$-approximation

In this subsection we design algorithms for finding connected $k$-subgraphs of $G$ that jointly provide an $O(n^{2/5})$-approximation to D$k$SP. Among the outputs of all these algorithms (with input $G$), we select the densest one, denoted as $C$. Then it can be guaranteed that $\sigma^*_k(G)/\sigma(C) \le O(n^{2/5})$. In view of the $O(n^2/k^2)$-approximation of Algorithm 1, we may focus on the case of $k < n^{4/5}$. (Note that $n^2/k^2 \le n^{2/5}$ if $k \ge n^{4/5}$.)

Let $D$ be a densest connected subgraph of $G$, which is computable in time $O(mn \log(n^2/m))$ [T5] [H] (because every component of a densest subgraph of $G$ is also a densest subgraph of $G$). Thus

\[ \sigma(D) = \sigma^*(G) \ge \sigma^*_k(G). \]

Moreover, the maximality of $\sigma(D)$ implies that $D$ has no removable vertices.

Algorithm 2. Input: connected graph $G$ along with its densest connected subgraph $D$.

Output: a connected $k$-subgraph of $G$, denoted as $\mathrm{Alg2}(G)$.

1. If $|D| < k$ then Expand $D$ to be a connected $k$-subgraph $H$ of $G$

Output $\mathrm{Alg2}(G) \leftarrow H$

2. Else Output $\mathrm{Alg2}(G) \leftarrow \mathrm{Prc1}(D)$


Lemma 3.4. If $k < n^{4/5}$, then $\sigma(\mathrm{Alg2}(G)) \ge \min\{k/(4n), n^{-2/5}\} \cdot \sigma^*(G)$.

Proof. In case of $|D| < k$, by Lemma 2.4 the density of the output subgraph satisfies $\sigma(H) \ge \sigma(D)/\sqrt{k} = \sigma^*(G)/\sqrt{k}$. Since $k < n^{4/5}$, we see that $\sigma(H) \ge n^{-2/5} \cdot \sigma^*(G)$.

In case of $|D| \ge k$, we deduce from Lemma 3.1 that the connected $k$-subgraph $\mathrm{Alg2}(G) = \mathrm{Prc1}(D)$ of $D$ has density at least $\frac{k}{4|D|} \cdot \sigma(D) \ge \frac{k}{4n} \cdot \sigma^*(G)$. □

Our next algorithm is simply an expansion of Procedure 2 by Feige et al. [13]. Let $V_h$ be a set of $k/2$ vertices of highest degrees in $G$, and let $d_h = \frac{2}{k} \sum_{v \in V_h} d_G(v)$ denote the average degree of the vertices in $V_h$.

Algorithm 3. Input: connected graph $G$ with $|G| \ge k$.

Output: a connected $k$-subgraph of $G$, denoted as $\mathrm{Alg3}(G)$.

1. $V_h^* \leftarrow$ a $(k/2)$-attachment of $V_h$ in $G$

2. $H \leftarrow$ a densest component of $G[V_h \cup V_h^*]$

3. Output $\mathrm{Alg3}(G) \leftarrow$ a connected $k$-subgraph of $G$ that is expanded from $H$

In the above algorithm, the subgraph $G[V_h \cup V_h^*]$ is exactly the output of Procedure 2 in [13], for which it has been shown (cf. Lemma 3.2 of [13]) that

\[ \bar\sigma := \sigma(G[V_h \cup V_h^*]) \ge k d_h/(2n). \]

Together with Lemma 2.4, we have the following result.

Lemma 3.5. $\sigma(\mathrm{Alg3}(G)) \ge \frac{\bar\sigma}{\sqrt{k}} \ge \frac{\sqrt{k}}{2n} \cdot d_h$.

Proof. It follows from Lemma 2.4 that $\sigma(\mathrm{Alg3}(G)) \ge \sigma(H)/\sqrt{k} \ge \bar\sigma/\sqrt{k}$. □

Our last algorithm is a slight modification of Procedure 3 in [13], where we link things up via a "hub" vertex. For vertices $u, v$ of $G$, let $W(u, v)$ denote the number of walks of length 2 from $u$ to $v$ in $G$.

Algorithm 4. Input: connected graph $G = (V, E)$ with $|G| \ge k$.

Output: a connected $k$-subgraph of $G$, denoted as $\mathrm{Alg4}(G)$.

1. $G_\ell \leftarrow G[V \setminus V_h]$.

2. Compute $W(u, v)$ for all pairs of vertices $u, v$ in $G_\ell$.

3. For every $v \in V \setminus V_h$, construct a connected $k$-subgraph $C_v$ of $G$ as follows:

- Sort the vertices $u \in V \setminus V_h \setminus \{v\}$ with positive $W(v, u)$ as $v_1, v_2, \ldots, v_t$ such that $W(v, v_1) \ge W(v, v_2) \ge \cdots \ge W(v, v_t) > 0$.

- $P_v \leftarrow \{v_1, \ldots, v_{\min\{t, k/2 - 1\}}\}$

- $B_v \leftarrow$ a set of $\min\{d_{G_\ell}(v), k/2\}$ neighbors of $v$ in $G_\ell$ such that the number of edges between $B_v$ and $P_v$ is maximized.

- $C_v \leftarrow$ the component of $G_\ell[\{v\} \cup B_v \cup P_v]$ that contains $v$

- Expand $C_v$ to be a connected $k$-subgraph of $G$

4. Output $\mathrm{Alg4}(G) \leftarrow$ the densest $C_v$ for $v \in V \setminus V_h$
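
Step 2 and the construction of $P_v$ in Step 3 can be illustrated with a short sketch. It is written under our own assumptions: the graph is a dictionary `{vertex: set_of_neighbours}` standing for $G_\ell$, the function names are hypothetical, and the counting simply enumerates pairs of neighbours of each middle vertex rather than using the method behind the running time stated in the paper.

```python
def length2_walks(adj):
    """W[u][v] = number of walks u - x - v of length 2 (only positive entries stored)."""
    W = {u: {} for u in adj}
    for x, nb in adj.items():
        # Every ordered pair (u, v) of neighbours of x gives one walk u - x - v.
        for u in nb:
            for v in nb:
                W[u][v] = W[u].get(v, 0) + 1
    return W

def top_partners(W, v, limit):
    """Up to `limit` vertices u != v with the largest positive W(v, u), as in P_v."""
    others = [u for u in W[v] if u != v]
    others.sort(key=lambda u: W[v][u], reverse=True)
    return others[:limit]

# A 4-cycle 0-1-2-3-0 with an extra vertex 4 attached to vertex 1.
adj = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 3}, 3: {0, 2}, 4: {1}}
W = length2_walks(adj)
P0 = top_partners(W, 0, 2)
```

In this graph vertex 0 reaches vertex 2 by two walks of length 2 (through 1 and through 3) and vertex 4 by one, so the ordering used for $P_v$ puts vertex 2 first. Note that $W(v, v)$ equals the degree of $v$ and is excluded when forming $P_v$.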


In the above algorithm, $B_v$ can be found in $O(m + n \log n)$ time, and $v$ is the "hub" vertex ensuring that $C_v$ is connected. Hence the algorithm is correct, and runs in $O(mn + n^2 \log n)$ time, where Step 2 finishes in $O(n^2 \log n)$ time. The key point here is that $C_v$ contains all edges between $B_v$ and $P_v$, where $B_v$ and $P_v$ are not necessarily disjoint. Using a similar analysis to that in [13], we obtain the following.


Lemma 3.6. If $k \le \frac{2}{3} n$, then

\[ \sigma(\mathrm{Alg4}(G)) \ge \frac{(\sigma^*_k(G) - 2\bar\sigma)^2}{2 \max\{k, 2d_h\}} \cdot \frac{k - 2}{k} \ge \frac{(\sigma^*_k(G) - 2\bar\sigma)^2}{6 \max\{k, 2d_h\}}. \]

Proof. From Lemma 3.3 of [13] we know that $G_\ell$ contains a $k$-subgraph, denoted as $H$, with average degree at least $\sigma^*_k(G) - 2\bar\sigma$. Note that the number of length-2 walks within $H$ is at least $k(\sigma^*_k(G) - 2\bar\sigma)^2$. This is because each $v \in V(H)$ contributes $(d_H(v))^2$ to this number, and $\sum_{v \in V(H)} (d_H(v))^2 \ge k(\sigma^*_k(G) - 2\bar\sigma)^2$ by convexity. It follows that there is a vertex $v \in V(H)$ which is the endpoint of at least $(\sigma^*_k(G) - 2\bar\sigma)^2$ length-2 walks in $H$. By the construction of $P_v$, there are at least $(\sigma^*_k(G) - 2\bar\sigma)^2 \cdot \frac{k - 2}{2k}$ walks of length 2 between this vertex $v$ and vertices in $P_v$. Therefore, the number of edges between $B_v$ and $P_v$ is at least $\frac{(\sigma^*_k(G) - 2\bar\sigma)^2 (k - 2)}{2k}$ if $d_{G_\ell}(v) \le k/2$, and at least

\[ \frac{(\sigma^*_k(G) - 2\bar\sigma)^2 (k - 2)}{2k} \cdot \frac{k/2}{d_{G_\ell}(v)} \]

otherwise. Since we do not require $P_v$ and $B_v$ to be disjoint, each edge may have been counted twice. Notice from the definition of $d_h$ that $d_{G_\ell}(v) \le d_G(v) \le d_h$. Since $C_v$ contains all edges between $B_v$ and $P_v$, it contains at least

\[ \min\left\{ \frac{(\sigma^*_k(G) - 2\bar\sigma)^2 (k - 2)}{4k},\ \frac{(\sigma^*_k(G) - 2\bar\sigma)^2 (k - 2)}{8 d_h} \right\} \]

edges. This guarantees $\sigma(\mathrm{Alg4}(G)) \ge \frac{(\sigma^*_k(G) - 2\bar\sigma)^2 (k - 2)}{2k \max\{k, 2d_h\}}$. Since $k \ge 3$, the lemma follows. □

We are now ready to prove that the four algorithms given above jointly guarantee an $O(n^{2/5})$-approximation.

Theorem 3.7. A connected $k$-subgraph $C$ of $G$ can be found in $O(mn \log n)$ time such that $\sigma^*_k(G)/\sigma(C) \le O(n^{2/5})$.

Proof. Let $C$ be the densest connected $k$-subgraph of $G$ among the outputs of Algorithms 1 to 4. As mentioned at the beginning of Section 3.2, it suffices to consider the case of $k < n^{4/5}$. The connectivity of $C$ gives $\sigma(C) \ge 1$. Clearly, we may assume $n \ge 8$, which along with $k < n^{4/5}$ implies $k \le 2n/3$. By Lemmas 3.4, 3.5 and 3.6 we may assume that

\[ \sigma(C) \ge \max\left\{ \frac{k\,\sigma^*(G)}{4n},\ \frac{\bar\sigma}{\sqrt{k}},\ \frac{\sqrt{k}\, d_h}{2n},\ \frac{(\sigma^*_k(G) - 2\bar\sigma)^2}{6 \max\{k, 2d_h\}} \right\}. \]

If $k > n^{3/5}$, then $\sigma(C) \ge k \cdot \sigma^*(G)/(4n) \ge \sigma^*(G)/(4n^{2/5}) \ge \sigma^*_k(G)/(4n^{2/5})$. If $k < n^{2/5}$, then $\sigma(C) \ge 1 \ge \sigma^*_k(G)/k \ge \sigma^*_k(G)/n^{2/5}$. So we are only left with the case of $n^{2/5} \le k \le n^{3/5}$.

Since $\sigma(C) \ge \bar\sigma/\sqrt{k} \ge \bar\sigma/n^{3/10} \ge \bar\sigma/n^{2/5}$, we may assume $\bar\sigma < \sigma^*_k(G)/4$, and hence $\sigma^*_k(G) - 2\bar\sigma \ge \sigma^*_k(G)/2$. Next we use the geometric mean, together with $\sigma(C) \ge 1$, to prove the performance guarantee as claimed.

In case of $k \ge 2d_h$, since $\sigma^*(G) \ge \sigma^*_k(G)$, we have

\[ \sigma(C) \ge \left( 1 \cdot \frac{k\,\sigma^*(G)}{4n} \cdot \frac{(\sigma^*_k(G)/2)^2}{6k} \right)^{1/3} \ge \frac{\sigma^*_k(G)}{5 n^{1/3}}. \]

In case of $k < 2d_h$, we have

\[ \sigma(C) \ge \left( 1 \cdot \frac{\sqrt{k}\, d_h}{2n} \cdot \frac{(\sigma^*_k(G)/2)^2}{12 d_h} \cdot \frac{\sqrt{k}\, d_h}{2n} \cdot \frac{(\sigma^*_k(G)/2)^2}{12 d_h} \right)^{1/5} \ge \frac{\sigma^*_k(G)}{7 n^{2/5}}, \]

where the last inequality follows from the fact that $k \ge \sigma^*_k(G)$. □


4 Conclusion

In Section 3, we have given four strongly polynomial time algorithms that jointly guarantee an $O(\min\{n^{2/5}, n^2/k^2\})$-approximation for the unweighted problem DC$k$SP. The approximation ratio is compared with the maximum density of all $k$-subgraphs, and in this case no $O(n^{1/3 - \varepsilon})$-approximation for any $\varepsilon > 0$ can be expected (recall $\Lambda \ge n^{1/3}/3$ in Example 1.1(a)). When studying the weighted generalization HC$k$SP, we can extend the techniques developed in Section 3.1 and obtain an $O(n^2/k^2)$-approximation for the weighted case. Besides, the following simple greedy approach achieves a $(k/2)$-approximation.

Algorithm 5. Input: connected graph $G = (V, E)$ with $|G| \ge k$ and weight $w \in \mathbb{Z}_+^E$. Output: a connected $k$-subgraph of $G$, denoted as $\mathrm{Alg5}(G)$.

1. For every $v \in V$, sort the neighbors of $v$ as $v_1, v_2, \ldots, v_t$ such that $w(vv_1) \ge w(vv_2) \ge \cdots \ge w(vv_t)$, where $t = \min\{d_G(v), k - 1\}$

2. $C_v \leftarrow G[\{v, v_1, v_2, \ldots, v_t\}]$

3. If $|C_v| < k$, then expand it to be a connected $k$-subgraph

4. Output $\mathrm{Alg5}(G) \leftarrow$ the heaviest $C_v$ for all $v \in V$
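
A minimal sketch of Algorithm 5 is given below. The data layout is our own assumption: `adj` maps each vertex to its neighbour set, `w` maps each edge, stored as a `frozenset` of its two endpoints, to its weight, and the function name is hypothetical. The expansion in Step 3 is left arbitrary in the paper; this sketch grows $C_v$ by a breadth-first scan, which is just one admissible choice.

```python
def heaviest_star_subgraph(adj, w, k):
    """Return (vertex_set, total_weight) of the heaviest candidate C_v.

    Assumes the graph is connected and has at least k vertices.
    """
    def weight(vs):
        # Total weight of the edges with both ends in vs.
        return sum(wt for e, wt in w.items() if e <= vs)

    best = None
    for v in adj:
        # Step 1: neighbours of v ordered by non-increasing edge weight.
        nbrs = sorted(adj[v], key=lambda u: w[frozenset((u, v))], reverse=True)
        chosen = [v] + nbrs[:k - 1]            # Step 2: vertices of C_v
        picked = set(chosen)
        # Step 3: grow along edges until there are k vertices (stays connected).
        i = 0
        while len(chosen) < k:
            for u in adj[chosen[i]]:
                if u not in picked and len(chosen) < k:
                    picked.add(u)
                    chosen.append(u)
            i += 1
        total = weight(picked)
        if best is None or total > best[1]:
            best = (picked, total)             # Step 4: keep the heaviest C_v
    return best

# Path 0 - 1 - 2 - 3 with edge weights 5, 1, 4.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
w = {frozenset((0, 1)): 5, frozenset((1, 2)): 1, frozenset((2, 3)): 4}
vertices, total = heaviest_star_subgraph(adj, w, 3)
```

On this path with $k = 3$, the candidates built around vertices 0 and 1 both induce $\{0, 1, 2\}$ with weight 6, which is the value returned.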

Notice that the weighted degree of a vertex $v$ in any heaviest $k$-subgraph of $G$ is not greater than the weight of $C_v$ constructed in Algorithm 5. It is easy to see that Algorithm 5 outputs a connected $k$-subgraph of $G$ whose weighted density is at least $2/k$ of that of the heaviest $k$-subgraph of $G$ (which is not necessarily connected). The running time is bottlenecked by the sorting at Step 1, which takes $O(d_G(v) \log d_G(v))$ time for each $v \in V$. Hence the algorithm runs in $O(\log n \cdot \sum_{v \in V} d_G(v)) = O(m \log n)$ time. As $\min\{n^2/k^2, k\} \le n^{2/3}$, we have the following result.

Theorem 4.1. For any connected graph $G = (V, E)$ with weight $w \in \mathbb{Z}_+^E$, a connected $k$-subgraph $H$ of $G$ can be found in $O(nm)$ time such that $\sigma^*_k(G, w)/\sigma(H, w) \le O(\min\{n^{2/3}, n^2/k^2, k\})$, where $\sigma(H, w)$ is the weighted density of $H$, and $\sigma^*_k(G, w)$ is the weighted density of a heaviest $k$-subgraph of $G$ (which is not necessarily connected).

Since the weighted density of a graph is not necessarily related to its number of edges or vertices, a couple of the results in the previous sections (such as Lemmas 2.4, 3.5 and 3.6) do not hold for the general weighted case. Nor do the existing techniques for extending unweighted case approximations to weighted cases apply to our setting, due to the connectivity constraint. An immediate question is whether an $O(n^{2/5})$-approximation algorithm exists for HC$k$SP. Note from $\Lambda_w \ge n^{1/2}/2$ in Example 1.1(b) that no one can achieve an $O(n^{1/2 - \varepsilon})$-approximation for any $\varepsilon > 0$ if the solution value is compared with the maximum weighted density of all $k$-subgraphs. Among other algorithmic approaches, analyzing the properties of densest/heaviest connected $k$-subgraphs is an important and challenging task in obtaining improved approximation ratios for DC$k$SP and HC$k$SP.